Capstone Project - The Battle of Neighborhoods (Week 2)

Coursera Course: Applied Data Science Capstone

Introduction

In this capstone project, I apply the skills and tools I learned in the Coursera courses. I have selected New York City for the project, and I am helping the stakeholders narrow down the location for their new business.

Business Problem

In this project, we are assisting a large Indian restaurant chain in opening a new restaurant abroad. Our stakeholders currently own around 200 restaurants in India and want to expand by opening their first restaurant in New York City, USA. Our task is to recommend where in the city that restaurant should open.

Since New York City has a large number of restaurants, we look for locations that are not already crowded with Indian restaurants.

The stakeholders are interested not only in a location with little competition but also in one where they can make a good profit. We therefore help them find areas that are busy with venues in other categories, such as Arts & Entertainment, College & University, and Professional & Other Places.

Data

There are five boroughs in New York City. We look at the number of Indian restaurants in all five boroughs. Based on the count and density of the restaurants, we can decide where to open the new restaurant.

Following factors will influence our decision:

Following data sources will be needed to extract/generate the required information:

Importing the required libraries

Boroughs of New York City

New York City has five boroughs:

  1. The Bronx
  2. Brooklyn
  3. Manhattan
  4. Queens
  5. Staten Island

More information about the boroughs can be found on Wikipedia.

Let's first find the latitude and longitude of all five boroughs using the geopy library.
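A minimal sketch of this lookup is shown here. The helper names, the query suffix, and the `user_agent` string are assumptions made for illustration; only `Nominatim` and its `geocode` method come from geopy itself. The geocoder is passed in as a callable so the lookup logic can be tried without network access.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple

BOROUGHS = ["The Bronx", "Brooklyn", "Manhattan", "Queens", "Staten Island"]

Coordinates = Tuple[float, float]


def geocode_boroughs(
    names: Iterable[str],
    geocode: Callable[[str], Optional[Coordinates]],
    suffix: str = ", New York City, NY",  # assumed suffix to disambiguate the query
) -> Dict[str, Coordinates]:
    """Return {borough: (latitude, longitude)} for every name that resolves."""
    coords = {}
    for name in names:
        result = geocode(name + suffix)
        if result is not None:  # skip names the geocoder could not resolve
            coords[name] = result
    return coords


def nominatim_geocoder(user_agent: str = "ny_explorer") -> Callable[[str], Optional[Coordinates]]:
    """Build a geocode callable backed by geopy's Nominatim (needs network access)."""
    # Imported lazily so geocode_boroughs itself has no third-party dependency.
    from geopy.geocoders import Nominatim

    locator = Nominatim(user_agent=user_agent)

    def geocode(query: str) -> Optional[Coordinates]:
        location = locator.geocode(query)
        return None if location is None else (location.latitude, location.longitude)

    return geocode
```

With geopy installed, `geocode_boroughs(BOROUGHS, nominatim_geocoder())` would return one coordinate pair per borough that Nominatim can resolve.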

Based on information from Wikipedia and visual inspection, a search radius for each borough is specified in the dataframe.

Using the above coordinates, a circle is drawn for each borough. This helps us understand the search radius for the restaurants.

It looks like we are covering almost the entire area of the boroughs. Although a few circles overlap, we can remove the duplicate data later using the borough and zip code data.
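Whether two search circles overlap can also be checked numerically: they overlap when the great-circle distance between their centers is smaller than the sum of their radii. The sketch below uses the haversine formula; the center coordinates are approximate and the radii are hypothetical placeholders, not the values used in the dataframe.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def circles_overlap(center_a, radius_a_m, center_b, radius_b_m):
    """True when the two search circles share some area."""
    return haversine_m(*center_a, *center_b) < radius_a_m + radius_b_m


# Approximate borough centers, for illustration only.
MANHATTAN = (40.7831, -73.9712)
BROOKLYN = (40.6782, -73.9442)

# Hypothetical radii in meters; substitute the radii chosen for each borough.
overlapping = circles_overlap(MANHATTAN, 10_000, BROOKLYN, 10_000)
```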

Postal codes and Neighbourhood details of Boroughs

Using data from the New York Department of Health, we collect the borough and neighborhood for every zip code in New York City.
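One way this zip code lookup could later be used to clean up venues returned by overlapping search circles is sketched here. The row fields (`id`, `postal_code`, `borough`) and the `zip_to_borough` dictionary are hypothetical names chosen for illustration. Each venue is kept once, and its borough is taken from the zip code when the zip code is known.

```python
def combine_boroughs(rows_by_borough, zip_to_borough):
    """Merge per-borough venue rows into one list without duplicate venues.

    rows_by_borough: {searched_borough: [row, ...]} where each row is a dict
                     with at least an "id" and, when available, a "postal_code".
    zip_to_borough:  {postal_code: borough} lookup built from the zip code data.
    """
    seen = set()
    combined = []
    for searched_borough, rows in rows_by_borough.items():
        for row in rows:
            if row["id"] in seen:  # already collected from an overlapping circle
                continue
            seen.add(row["id"])
            # Prefer the borough implied by the zip code; fall back to the
            # borough whose search circle returned the venue.
            assigned = zip_to_borough.get(row.get("postal_code"), searched_borough)
            combined.append({**row, "borough": assigned})
    return combined
```

Falling back to the searched borough when the zip code is missing is a design choice of this sketch; dropping such rows instead would be equally defensible.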

Indian Restaurants in all 5 Boroughs using Foursquare API

Now we go through the five boroughs one by one and count the Indian restaurants.

The credentials below have been removed for security purposes.
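For orientation, a venue search request could be assembled roughly as follows. This is a sketch that assumes the legacy Foursquare v2 `venues/search` endpoint and its `ll`, `radius`, `categoryId`, `limit`, and `v` query parameters; the version date, the default limit, and all argument values are placeholders. The category identifier for Indian restaurants has to be looked up in the Foursquare category list and is deliberately left as a parameter.

```python
from urllib.parse import urlencode

# Legacy v2 endpoint; assumed here, check the current Foursquare documentation.
SEARCH_ENDPOINT = "https://api.foursquare.com/v2/venues/search"


def build_search_url(client_id, client_secret, lat, lon, radius_m, category_id,
                     version="20180605", limit=50):
    """Return the request URL for a category-restricted venue search.

    version and limit are placeholder defaults, not values from the notebook.
    """
    params = {
        "client_id": client_id,          # credential, never commit to a notebook
        "client_secret": client_secret,  # credential, never commit to a notebook
        "v": version,                    # API version date (YYYYMMDD)
        "ll": f"{lat},{lon}",            # center of the search circle
        "radius": radius_m,              # search radius in meters
        "categoryId": category_id,       # restrict results to one category
        "limit": limit,                  # maximum number of venues to return
    }
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"
```

The resulting URL can be passed to any HTTP client, for example `requests.get(url).json()`. Reading the credentials from environment variables keeps them out of the notebook.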

Borough : Bronx

We are including a few more columns to better understand the venues.

We only need a few columns for the analysis.
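Selecting those columns amounts to flattening each venue in the JSON response into one row. The sketch below assumes the v2 response layout (`response` → `venues`, each with `id`, `name`, `location`, and `categories`); the output column names are illustrative and may differ from the ones used in the notebook.

```python
def venue_to_row(venue, borough):
    """Flatten one venue dict into the handful of columns used for analysis."""
    location = venue.get("location", {})
    categories = venue.get("categories") or [{}]  # tolerate a missing category list
    return {
        "id": venue.get("id"),
        "name": venue.get("name"),
        "category": categories[0].get("name"),
        "lat": location.get("lat"),
        "lng": location.get("lng"),
        "postal_code": location.get("postalCode"),
        "borough": borough,
    }


def venues_to_rows(response_json, borough):
    """Convert a venue search response into a list of flat rows."""
    venues = response_json.get("response", {}).get("venues", [])
    return [venue_to_row(venue, borough) for venue in venues]
```

The list of rows can be handed to `pandas.DataFrame(rows)` to obtain the dataframe used in the following steps.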

Now we plot the data on the map to see how it looks.

It looks like the Bronx has very few Indian restaurants. Our stakeholders might be able to open a new restaurant here.

Borough : Brooklyn

We only need a few columns for the analysis.

Now we plot the data on the map to see how it looks.

It looks like Brooklyn has more Indian restaurants than the Bronx. Our stakeholders might face some competition here.

Borough : Manhattan

We are including a few more columns to better understand the venues.

We only need a few columns for the analysis.

Now we plot the data on the map to see how it looks.

Whoa! Manhattan has a lot of Indian restaurants. Our stakeholders will face a lot of competition here.

Borough : Queens

We only need a few columns for the analysis.

Now we plot the data on the map to see how it looks.

It looks like Queens also has many Indian restaurants, but they are concentrated in one or two locations. Our stakeholders have a chance to start their business here.

Borough : Staten Island

We only need a few columns for the analysis.

Now we plot the data on the map to see how it looks.

Hmm. Staten Island has very few Indian restaurants. Our stakeholders should be careful here, because this might not be the right place to open a new business. Perhaps not many people there like Indian cuisine. We may have to follow a different strategy for this borough and look at it in another way.

Combined data of all boroughs

In total, there are 520 Indian restaurants in New York City.

Looking at the above map, we can see that Brooklyn and Queens offer a good opportunity for a new restaurant. They are not crowded with Indian restaurants.

Methodology

In this project, our aim is to detect areas that have a low density of Indian restaurants and are good places to start a new restaurant that can bring a good profit for the stakeholders.

In the first step, we collected all Indian restaurants in each borough using the Foursquare API.

In the second step, we combined the data of all boroughs into a single dataframe and displayed it on the map. This step gave us a picture of the density of Indian restaurants in New York City.

In the third step, we focus on creating clusters using an unsupervised learning algorithm (k-means). The clusters give an idea of the density of Indian restaurants in each area. We avoid the dense areas and focus on areas with fewer Indian restaurants.
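To make the clustering step concrete, here is a minimal pure-Python sketch of k-means (Lloyd's algorithm) on (latitude, longitude) pairs. It is an illustration, not the notebook's implementation; in practice a library routine such as scikit-learn's `KMeans` would normally be used. Treating latitude and longitude as plain Euclidean coordinates is a simplifying assumption that is acceptable at city scale.

```python
import random


def kmeans(points, k, iterations=100, seed=0):
    """Cluster 2-D points into k groups; returns (labels, centroids)."""
    rng = random.Random(seed)           # fixed seed for reproducible results
    centroids = rng.sample(points, k)   # start from k distinct data points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        new_labels = [
            min(
                range(k),
                key=lambda c: (p[0] - centroids[c][0]) ** 2 + (p[1] - centroids[c][1]) ** 2,
            )
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, new_labels) if label == c]
            if members:
                new_centroids.append((
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                ))
            else:
                new_centroids.append(centroids[c])  # keep the centroid of an empty cluster
        if new_labels == labels and new_centroids == centroids:
            break  # nothing changed, so the algorithm has converged
        labels, centroids = new_labels, new_centroids
    return labels, centroids
```

Clusters with many members in a small area indicate dense groups of Indian restaurants, while large or sparse clusters point to areas with less competition.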

In the last step, we look for other businesses in the areas with a low density of Indian restaurants, because opening a new business at an unknown location is not a good business strategy. Before we draw a conclusion about where to open the restaurant, we look at venues in categories such as Arts & Entertainment, College & University, and Professional & Other Places. We can suggest that our stakeholders open the business near these places.

Analysis

Let's perform some basic exploratory data analysis and derive some additional information from our raw data.

First, we count the number of restaurants in each borough.
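A minimal sketch of this count, assuming each restaurant is a row with a `borough` field (a hypothetical field name); with a pandas dataframe, `df["borough"].value_counts()` would give the same table.

```python
from collections import Counter


def count_by_borough(rows):
    """Return (borough, count) pairs ordered from most to fewest restaurants."""
    counts = Counter(row["borough"] for row in rows)
    return counts.most_common()
```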

Just from looking at the map and the above table, we can say that our stakeholders will face a lot of competition in Manhattan. Staten Island and the Bronx have very few Indian restaurants; other factors would have to be taken into account there, which requires a more detailed analysis. Hence, we do not consider Manhattan, the Bronx, or Staten Island in our further analysis.

In the further analysis, we concentrate on Brooklyn and Queens. We apply k-means to the Indian restaurants in Brooklyn and Queens to understand the density of Indian restaurants and of other attractive business places.

K-Means on Brooklyn

Our stakeholders should avoid areas with a dense concentration of Indian restaurants, where they would face a lot of competition from existing restaurants. They can instead select places with fewer Indian restaurants.

Now we look at some of the 'Arts & Entertainment' venues in Brooklyn.

Map legend: Indian Restaurants, Arts & Entertainment

It looks like the first cluster at the top has fewer Indian restaurants and more "Arts & Entertainment" venues. Our stakeholders can open their business near one of the "Arts & Entertainment" venues.
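The visual comparison can be backed by a simple count of how many Indian restaurants and how many venues of the other category lie within a fixed radius of each cluster center. The sketch below is one possible way to do this; the 1 km default radius and the ranking rule (fewest Indian restaurants first, then most other venues) are assumptions made for illustration.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def count_within(center, points, radius_m):
    """Number of points no farther than radius_m from center."""
    return sum(
        1 for p in points
        if haversine_m(center[0], center[1], p[0], p[1]) <= radius_m
    )


def rank_centroids(centroids, restaurants, attractions, radius_m=1000):
    """Order cluster centers by few nearby restaurants, then many attractions."""
    scored = []
    for centroid in centroids:
        scored.append({
            "centroid": centroid,
            "indian_restaurants": count_within(centroid, restaurants, radius_m),
            "attractions": count_within(centroid, attractions, radius_m),
        })
    return sorted(scored, key=lambda s: (s["indian_restaurants"], -s["attractions"]))
```

Here `restaurants` and `attractions` are lists of (latitude, longitude) pairs, and `centroids` could be the cluster centers returned by the k-means step.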

Now we look at some of the 'College & University' venues in Brooklyn.

Map legend: Indian Restaurants, College & University

It looks like the last cluster at the bottom has fewer Indian restaurants and more "College & University" venues. Our stakeholders can open their business near one of the "College & University" venues.

Now we look at some of the 'Professional & Other Places' venues in Brooklyn.

Map legend: Indian Restaurants, Professional & Other Places

It looks like the last cluster at the bottom right has more Indian restaurants and more "Professional & Other Places" venues. Our stakeholders can open their business near one of the "Professional & Other Places" venues.

K-Means on Queens

Our stakeholders should avoid areas with a dense concentration of Indian restaurants, where they would face a lot of competition from existing restaurants. They can instead select places with fewer Indian restaurants.

Now we look at some of the 'Arts & Entertainment' venues in Queens.

Map legend: Indian Restaurants, Arts & Entertainment

Hmm! This looks a little challenging for our stakeholders.

Now we look at some of the 'College & University' venues in Queens.

Map legend: Indian Restaurants, College & University

Hmm! This looks a little challenging for our stakeholders.

Now we look at some of the 'Professional & Other Places' venues in Queens.

Map legend: Indian Restaurants, Professional & Other Places

Hmm! This looks a little challenging for our stakeholders.

Results and Discussion

Our analysis shows that there is a large number of Indian restaurants in New York City. Manhattan has more Indian restaurants, in denser concentrations, than the other boroughs. Opening a restaurant in Manhattan is challenging, because our stakeholders would have to compete against many existing Indian restaurants.

The Bronx and Staten Island have very few Indian restaurants. Other factors would have to be considered before opening a restaurant there, so we set these boroughs aside for now.

Brooklyn and Queens have a good number of Indian restaurants, but relative to their area they still have fewer than Manhattan. In most locations, the Indian restaurants are not densely packed. K-means clustering shows us where the Indian restaurants are concentrated. To open a new restaurant, our stakeholders have the following options.

The purpose of this analysis was only to provide information on areas that are not crowded with existing Indian restaurants. It is entirely possible that there is a very good reason for the small number of restaurants in any of those areas, one that would make them unsuitable for a new restaurant regardless of the lack of competition. The recommended zones should therefore be considered only a starting point for a more detailed analysis, which could eventually identify a location that not only has no nearby competition but also meets all other relevant conditions.

Conclusion

The purpose of this project was to help our stakeholders narrow down the search for an optimal location for a new Indian restaurant. By deriving restaurant density from Foursquare data, we identified the boroughs that justify further analysis (Brooklyn and Queens). Clustering helped us understand the areas within these boroughs. We also looked for areas that are dense with venues of other important categories but have few Indian restaurants, where our stakeholders have a good opportunity to open a new restaurant.

The final decision on the restaurant location will be made by the stakeholders, based on the characteristics of each neighborhood and on additional factors such as the attractiveness of each location (proximity to a park or water), noise levels and proximity to major roads, real estate availability, prices, and the social and economic dynamics of every neighborhood.